On The Representation Of Query Term Relations By Soft Boolean Operators
Abstract
The language analysis component in most text retrieval systems is confined to a recognition of noun phrases of the type normally included in back-of-the-book indexes, and an identification of related terms included in a preconstructed thesaurus of quasi-synonyms. Even such a restricted language analysis is fraught with difficulties because of the well-known problems in the analysis of compound nominals, and the hazards and cost of constructing word synonym classes valid for large text samples. In this study an extended (soft) Boolean logic is used for the formulation of information retrieval queries which is capable of representing both the use of compound noun phrases as well as the inclusion of synonym constructions in the query statements. The operations of the extended Boolean logic are described, and evaluation output is included to demonstrate the effectiveness of the extended logic compared with that of ordinary text retrieval systems.

Department of Computer Science, Cornell University, Ithaca, New York 14853. This study was supported in part by the National Science Foundation under grant IST 8316166.

I. Linguistic Approaches in Information Retrieval

It is possible to classify the various automatic text processing systems by the depth and type of linguistic analysis needed for their operations. Sophisticated language understanding components are believed to be essential to carry out automatic text transformations such as text abstracting and text translation. [1,14,24] Complete language understanding systems are also needed in automatic question-answering where direct responses to user queries are automatically generated by the system. [11] On the other hand, relatively less sophisticated language analysis systems may be adequate for bibliographic information retrieval, where references as opposed to direct answers are retrieved in response to user queries. [21]

In bibliographic retrieval, the content of individual documents is normally represented by sets of key words, or key phrases, and only a few specified term relationships are recognized using preconstructed dictionaries or thesauruses. Even in this relatively simplified environment one does not normally undertake a linguistic analysis of any scope. In fact, syntactic and semantic analysis have been used in bibliographic information retrieval only under special circumstances to analyze query phrases [22], to process structured text samples of a certain kind [7,15], or finally to process texts in severely restricted topic areas. [2]

Where special conditions do not obtain, the preferred approach in information retrieval has been to use statistical or probabilistic criteria for the generation of the content identifiers assigned to documents and search queries. Obviously, not all terms are equally useful for content identification. According to the term discrimination theory, the following criteria are of importance in this connection [16]:

a) terms which occur with high frequency in the documents of a collection are not preferred for content representation because such terms are too broad to distinguish the documents from each other;
b) terms which occur with very low frequency in the collection are also not optimal, because such terms affect only a very small fraction of documents;
c) the best terms tend to be low-to-medium frequency entities which can be produced by taking single terms that exhibit the required frequency characteristics; alternatively, it is possible to obtain medium frequency entities by refining high frequency terms thereby rendering them more narrow, or by broadening low frequency terms.

In many operational information situations, the term broadening and narrowing operations are effectively carried out by using formulations in which the terms are connected by Boolean operators. The use of Boolean logic in retrieval is discussed in more detail in the remainder of this note.
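The narrowing and broadening effect of Boolean operators can be illustrated with a small sketch. This is not the system evaluated in the study: the toy collection, the whitespace tokenizer, and the function names (`build_index`, `boolean_and`, `boolean_or`) are all assumptions made for illustration. The sketch shows how AND applied to a high-frequency term yields a smaller (narrower) document set, and how OR applied to low-frequency terms yields a larger (broader) one.

```python
from typing import Dict, List, Set

def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lowercase term to the set of document ids that contain it."""
    index: Dict[str, Set[int]] = {}
    for doc_id, text in docs.items():
        for term in set(text.lower().split()):
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Documents containing every term: narrows a high-frequency term."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()

def boolean_or(index: Dict[str, Set[int]], terms: List[str]) -> Set[int]:
    """Documents containing at least one term: broadens low-frequency terms."""
    result: Set[int] = set()
    for t in terms:
        result |= index.get(t, set())
    return result

# Hypothetical five-document collection, invented for this example.
docs = {
    1: "information retrieval systems",
    2: "information theory",
    3: "text retrieval and information systems",
    4: "thesaurus construction",
    5: "information about synonym dictionary construction",
}
index = build_index(docs)

# "information" occurs in 4 of 5 documents, so it is too broad on its own.
broad = index["information"]
# AND with a second term refines it to a medium-frequency entity.
narrowed = boolean_and(index, ["information", "retrieval"])
# "thesaurus" and "dictionary" each occur once; OR groups them into a class.
broadened = boolean_or(index, ["thesaurus", "dictionary"])

print(len(broad), sorted(narrowed), sorted(broadened))
```

In this toy setting the AND query plays the role of a compound phrase ("information retrieval") and the OR query plays the role of a synonym class, which are the two constructions the abstract says the extended logic is meant to represent.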